A Polynesian-specific copy number variant encompassing the MICA gene associates with gout

Abstract Gout is of particularly high prevalence in the Māori and Pacific (Polynesian) populations of Aotearoa New Zealand (NZ). Here, we investigated the contribution of common population-specific copy number variation (CNV) to gout in the Aotearoa NZ Polynesian population. Microarray-generated genome-wide genotype data from Aotearoa NZ Polynesian individuals with (n = 1196) and without (n = 1249) gout were analyzed. Comparator population groups were 552 individuals of European ancestry and 1962 of Han Chinese ancestry. Levels of circulating major histocompatibility complex (MHC) class I polypeptide-related sequence A (MICA) were measured by enzyme-linked immunosorbent assay. Fifty-four CNV regions (CNVRs) appearing in at least 10 individuals were detected, of which seven common (>2%) CNVRs were specific to or amplified in Polynesian people. A burden test of these seven revealed associations of insertion/deletion with gout (odds ratio (OR) 95% confidence interval [CI] = 1.80 [1.01; 3.22], P = 0.046). Individually testing of the seven CNVRs for association with gout revealed nominal association of CNVR1 with gout in Western Polynesian (Chr6: 31.36–31.45 Mb, OR = 1.72 [1.03; 2.92], P = 0.04), CNVR6 in the meta-analyzed Polynesian sample sets (Chr1: 196.75–196.92 Mb, OR = 1.86 [1.16; 3.00], P = 0.01) and CNVR9 in Western Polynesian (Chr1: 189.35–189.54 Mb, OR = 2.75 [1.15; 7.13], P = 0.03). Analysis of European gout genetic association data demonstrated a signal of association at the CNVR1 locus that was an expression quantitative trait locus for MICA. The most common CNVR (CNVR1) includes deletion of the MICA gene, encoding an immunomodulatory protein. Expression of MICA was reduced in the serum of individuals with the deletion. In summary, we provide evidence for the association of CNVR1 containing MICA with gout in Polynesian people, implicating class I MHC-mediated antigen presentation in gout.


Introduction
Gout is a common complex metabolic and inf lammatory disease caused by hyperuricemia and immune response to the deposition of monosodium urate (MSU) crystals in and around body tissues, particularly joints (1). The clinical features consist of intermittent gout flares, chronic gouty arthritis, tophi and increased prevalence of comorbidities that include hypertension, type 2 diabetes mellitus, chronic kidney disease, dyslipidemia and cardiovascular disease (1)(2)(3)(4)(5). The pathogenesis of gout requires progression through several phases: from hyperuricemia to the deposition of MSU crystals to subsequent immune response to these crystals (6). In some individuals, the crystals cause a toll-like receptor-mediated formation and activation of the NLRP3 inf lammasome in monocytes and production of interleukin-1β that drives the gout flare.
Genome-wide association studies (GWAS) that focus on single-nucleotide variants (SNVs) have provided considerable insight into the molecular pathogenesis of hyperuricemia (7)(8)(9), although GWAS in gout have been more limited (10). However, common SNVs explain only a relatively small fraction of the heritability of gout, with genome-wide heritability estimates of around 0.3 (11). Structural variants are defined as genomic rearrangements affecting >50 bp of sequence, and they impact a greater proportion of the genome than SNVs. Therefore, they have a greater likelihood of having an impact on molecular function and phenotype. Structural variants are firmly implicated in common disease including schizophrenia (12), Alzheimer's disease (13), intellectual disabilities (14), autism (15,16), Crohn's disease (17,18) and rheumatic diseases (19)(20)(21). In gout and hyperuricemia, single small studies have reported that copy number variations (CNVs) of ABCF1, IL17REL and FCGR3A are associated with the risk of gout in the Chinese population (22), and CNV upstream of SLC2A9 associates with urate levels in Europeans (23), suggesting a role for genomic structural variation in the pathogenesis of gout.
The prevalence of gout among Māori (the indigenous people of Aotearoa New Zealand) and New Zealand Pacific adults is 8% and 14%, respectively, compared to 4% for non-Māori and non-Pacific people (24). While structural inequities contribute to increased prevalence and poorer outcomes (25), research on populationspecific genetic variants associated with gout is important to provide insights into the disparities between different populations, especially in populations currently under-served with respect to participation in genetic studies. It is becoming clear that genetic variation that is specific to Pacific populations does exist and that such variation contributes to the pathogenesis of gout (26)(27)(28) and other metabolic diseases (29). Our primary hypothesis was that there are Polynesian-specific (or amplified) CNV regions (CNVRs) associated with gout and which may contribute to the increased prevalence of gout in Polynesian populations. Therefore, we aimed to identify Polynesian-specific CNVs contributing to gout using genome-wide genotype data in a total of 1196 people with and 1249 people without gout.

Discovery and characteristics of the CNVs
Aotearoa NZ Polynesian and European participants that had CNV count ≤312 (85th percentile) were used for CNV identification and analysis. Among 4353 CNV calls detected after the quality control filtering, 2607 (1427 duplications and 1180 deletions) were found in people with gout and 1746 (899 duplications and 847 deletions) in people without gout. The distribution of CNVs along each of the chromosomes is shown in Figure 1A. The identified CNVs ranged in size from 5007 to 144 150 872 bp (144 Mb), with an average length of 385.3 kb (Fig. 2 Figure S2 for the 20 most common CNVRs. Comparing length of all types of CNVs between gout and non-gout revealed no significant differences with respect to the average counts and length of CNV (Supplementary Material, Fig. S3 The seven CNVRs were then individually tested for association with gout in the Polynesian sample set. Three provided evidence for association with gout at a nominal level of significance in the fully adjusted model: CNVR1 (OR = 1.72, P = 0.04, Western Polynesian), CNVR6 (OR = 1.86, P = 0.01, meta-analyzed Polynesian) and CNVR9 (OR = 2.75, P = 0.03, Western Polynesian) ( Table 2). Linear regression testing for association with serum urate level in controls provided no evidence for association of any of the three CNVRs, even in the unadjusted models (Supplementary Material, Table S2). Association with gout but not serum urate level is consistent with a possible role of the deletions in the progression from hyperuricemia to gout.

Further validation of CNVR1, CNVR6 and CNVR9
Examination of the CNVR1, CNVR6 and CNVR9 genomic regions in the Database of Genomic Variants (DGV) revealed that these regions are established CNV sites with multiple reported losses and gains (Supplementary Material, Table S3). There are 6, 9 and 37 DGV Gold Standard Variants that overlap with CNVR1, CNVR9 and CNVR6, respectively (Supplementary Material, Table S3).
A total of 91 Polynesian participants with both microarray and 30X whole genome sequencing (WGS) data were used to validate CNVR1, CNVR6 and CNVR9. Read depth of these individuals was plotted from WGS data. The average consistency of the copy number calls in microarray and WGS data was 94% (Supplementary Material, Table S4, 96%, 88% and 97%, respectively), indicating that our CNV calls were reliable. Representative examples of WGS read-depth plots are shown in Supplementary Material, Figure S4.

Association analysis of loci encompassing CNVR1, CNVR6 and CNVR9 with gout in the UK Biobank
We used gout GWAS SNV data in a European dataset to test the CNVR1, CNVR6 and CNVR9 loci for association with gout (UK Biobank; 7131 cases and 325 239 controls). Supplementary Material, Figure S5 presents locus zoom plots for each locus. There was a signal of association in the CNVR1 region, but not CNVR6 or CNVR9. (The lead SNV at CNVR6 with P < 1 × 10 -4 (rs143765601) was uncommon with no gout-associated SNVs in linkage disequilibrium and therefore was considered unreliable.) At CNVR1, there appeared to be two signals of genetic association-one signal was marked by top associated variant rs3016018 and the second marked by rs9265955. We tested each for association with expression of genes within CNVR1 using the Gene and Tissue Expression (GTEx) resource (Supplementary Material, Fig. S6). Rs3016018 is associated with the expression of multiple protein-coding genes outside CNVR1 (e.g. HLA-S,  PSORS1C3, NOTCHC4, C4B); however, rs9265955, which also associated with the expression of multiple genes outside of CNVR1, is also associated with the expression of MHC class I polypeptide-related sequence A (MICA), the single protein-coding gene within CNVR1 (P = 3.6 × 10 −5 ). The gout risk (C) allele of rs9265955 increased the expression of MICA. This evidence from the UK Biobank supports the MICA locus as involved in the etiology of gout.

Physical connectedness within CNVR1
We used both GeneHancer (30) (visualized in the UCSC genome browser), which infers genomic connectedness based on multiple genomic data sources, and HiC physical connectedness data (31) to demonstrate genomic interaction between MICA and long noncoding regulatory RNAs HCP5 and LINC01149 that are both contained within CNVR1 (Fig. 3). This is consistent with a role of these long non-coding RNAs  Additional genetic support for MICA as a candidate causal gene in CNVR1 was obtained from European gout genetic association data and from the GTEx genetic control of gene expression resource. However equivalent data were not obtained for either CNVR6 or CNVR9.
Given the limited power of the Polynesian gout sample set, and the unavailability of a replication cohort, our strategy focused on reducing the statistical impact of multiple testing. Hence, we used an approach that would allow us to identify population-specific CNV rather than a genome-wide approach. The European and Han Chinese sample sets were included to serve this purpose. Future studies can focus on the impact of a wider number of common CNV on gout in these populations, leveraging considerably larger sample sizes.
The locus with the strongest evidence for a role in gout was CNVR1, which is dominated by deletion. Genetic (SNV) associations identified by GWAS for immune-mediated disease have been reported at this locus, including HIV infection (32) and autoimmune diseases such as Grave's disease (33) and systemic lupus erythematosus (34). CNVs that overlap CNVR1 have been reported to be associated with nasopharyngeal carcinoma predisposition (35,36), type 1 diabetes (37) and schizophrenia or bipolar disorder (38) and, now, gout. Recurrent hemizygous deletion at the locus associates with idiopathic pulmonary hemosiderosis, a rare disease without an identified trigger (39). We also provided additional evidence of gout risk in Europeans and gene expression data from GTEx for a role of MICA within CNVR1 in the etiology of gout. The CNV encompasses one protein-coding gene (MICA), which is a strong candidate causal gene. Levels of the MICA protein are reduced in people with the deletion (Fig. 4). MICA encodes a genotoxic stress-induced protein and serves as a ligand for Natural Killer Group 2 member (NKG2D). NKG2D is an immune receptor on the surface of NK cells (40), which binds to immune cells expressing MICA and triggering lysis in these target cells. NK cells can also indirectly activate monocytes and neutrophils through direct interaction or cytokine secretion (41,42). In gout, reduced levels of MICA might result in accumulation of macrophages and an increased immune response to MSU crystals. Urate induces MICA expression via TAK1 promoting NK cell killing and immunosuppression (43)(44)(45). Reduction of urate in an animal model via inhibition of xanthine oxidoreductase activity or gene knockout ablates the genotoxic stress-induced expression of MICA. Additionally, genetic variants that associate with MICA serum levels also associate with inflammatory phenotypes giant cell arteritis, ankylosing spondylitis and hepatitis C and also with DNA methylation in the HLA region (46)(47)(48)(49). In addition to MICA, three non-protein-coding genes map within CNVR1. HCP5 is a lncRNA gene primarily expressed in the immune system and its expression is modulated in chronic kidney disease (50), allergic rhinitis (51) and multiple cancers (52). HCP5 could be an example of an immunepriming lncRNA (53) that perhaps modulates expression of MICA-we were able to show that within CNVR1 there is physical interaction between the HCP5 and MICA genes via the LINC01149 locus (Fig. 3). Also mapping within the deletion are pseudogene HCG26 and non-coding Y RNA ENSG00000199332.1-the functional impact of each is poorly understood. On the basis of current understanding, we speculate that the CNVR1 deletion may increase the risk of gout via perturbed immune activation and surveillance pathways downstream of MICA and control of its expression by regulatory RNAs in the locus.
Despite CNVR1 being prevalent in Eastern Polynesian (15.8% in non-gout), it was not associated with gout in this ancestral group for reasons that are, as yet, unclear. However, we have observed a dichotomy in association with gout between Eastern and Western Polynesian at the ABCG2 locus, despite the gout-associated rs2231142 risk allele being prevalent in both ancestral groups (54). Similarly, the ABCG2 rs10011796 risk allele was associated with tophi in Western Polynesian but not in Eastern Polynesian people (55). The dichotomy may relate to other interacting variants that also differ in frequency between the Eastern and Western Polynesian ancestral groups. Our results underline that not only pan-ancestry but also sub-regional population differences should be taken into account when conducting biomedical genetic research in the Māori and Pacific populations of Aotearoa NZ.
In conclusion, we identified Polynesian-specific CNVs. for association with gout in Polynesian. We identified MICA as a strong candidate causal gene and demonstrated reduced circulating MICA levels in people with deletion at CNVR1. Our study provides new insights into the pathogenic mechanism underlying gout in people of Māori and Pacific ancestry living in Aotearoa NZ.

Participants
The Aotearoa NZ Polynesian sample set (individuals of Aotearoa NZ and Cook Island Māori, Samoan, Tongan, Niuean and Tokelauan ancestry) comprised 1196 participants with and 1249 without gout (Table 3). All people with gout met the 1977 American Rheumatism Association preliminary gout classification criteria (56). The sample set included 171 Māori with and 98 Māori without gout from the rohe (area) of the Ngāti Porou iwi (tribe) of the Tairāwhiti region of Aotearoa NZ. The Aotearoa NZ European sample set (552 participants without gout) and the Han Chinese sample set (57) (1962 participants without gout) were included for comparison purposes to facilitate identification of Polynesian-specific CNV. All ethnicity was self-reported.
The Polynesian sample set was divided into three groups (Eastern Polynesian, Western Polynesian and Mixed Eastern-Western Polynesian

CNV identification and assignment of genotype
Supplementary Material, Figure S9 illustrates the workflow. All genomic locations were derived from NCBI GRCh37/UCSC hg19 coordinates. LRR and BAF values were used to determine copy number. As a normalized measure of total signal intensity, LRR values of 0 represent two copies with lower values in identifying areas of loss and higher values areas of gain.   Table S1).

Statistical analysis
All statistical analysis was performed using R (version 4.0.3). To test for statistically significant differences in the distribution of copy number between cases and controls, regression analyses were implemented within each Polynesian population group. Two models were used: adjustment with age and sex (Model 1); and Model 1 plus adjustment by genotyping batches and PCs 1-10 derived from the genome-wide genotype data as previously described (29)

Validation of CNV calls in WGS and publicly available datasets
Among the Polynesian cohort, 91 individuals with both microarray and WGS data were used to confirm the presence of CNV calls. Each chromosome was divided into 1 kb bins, and read depth of coverage in each bin was calculated by Samtools (63). The read depth of the bins was plotted to confirm the deletions and duplications. The DGV (http://dgv.tcag.ca/dgv/app/home) that includes annotated CNVs was also used for confirmation of CNV.

Gout GWAS in UK Biobank
A total of 332 370 European individuals from the UK Biobank (64) were included under the approval number 12611. The UK Biobank has ethical approval from the North West Multi-Centre Research Ethics Committee (11/NW/0382) and obtained written informed consent from all participants prior to the study. Gout cases were defined as individuals with any of self-reported gout, urate-lowering therapy use (allopurinol or sulfinpyrazone) with no diagnosis of leukemia or lymphoma (ICD codes C81 and C96), primary or secondary diagnosis of gout (ICD code M10) (11). Exclusion criteria were sex chromosome and self-reported sex mismatch, genotype QC failure, relatedness (KING coefficient >0.177) and primary or secondary kidney disease. The GWAS was done using a total of 27 287 012 variants imputed from 845 487 genotyped variants. A logistic regression model was produced for each variant adjusting for age, sex and the first 40 genetic PCs using Plink version 1.9 6.10 (65).

Enzyme-linked immunosorbent assay
A Human MICA ELISA Kit (Invitrogen, Thermo Fisher Scientific, Waltham, Maryland, USA) was used to detect soluble MICA (following the manufacturer's instructions) in serum that had been stored at −80 • C. Absorbance was measured at 450 nm. A standard curve of the logarithmic relationship between concentration and absorbance was used to calculate the concentration of soluble MICA in serum samples. All people homozygous for the deletion (CN = 0) were selected for measuring of MICA serum levels. For each of CN = 1 and CN = 2, we began by randomly selecting 30 gout and 30 non-gout individuals (i.e. 60 for each genotype). However, after accounting for unavailability of serum or insufficient serum, the eventual numbers assayed were 49 for CN = 1 and 43 for CN = 2. Association data across the CNVR1, CNVR6 and CNVR9 were specifically queried from these GWAS data.